-
Notifications
You must be signed in to change notification settings - Fork 129
Added i18n component and related scripts #1082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…plits the text into manageable chunk
…xt handling and error reporting
…toring segment handling to use arrays for better management
…ling in Promise.all
…ity and consistency
… merging logic and remove unnecessary debug logs
…ager version in package.json
…lations inside SCHEME and SCHEMEINLINE tags
…ded try catch when parsing xml to json to report failed parsing possibly attributed to unsound xml structure.
- Created a new XML file for references (97references97.xml) containing a comprehensive list of references used in the SICP JS project. - Added a new XML file for the index preface (98indexpreface98.xml) to provide context and formatting for the index section. - Introduced a new XML file for the making section (99making99.xml) detailing the background, interactive features, and development history of the SICP JS project. - Updated subsection2.xml to close the previously open SUBSECTION tag and include a comment for clarity.
…LINE tags in XML parsing
breaking changes are made: xml repositories are divided into folders, currently consisting of en and cn folders to store translated content. The same applies to the json folder after running "yarn json". Frontend needs to be changed accordingly to fetch json files from the corresponding url |
i18n/controllers/translator.ts
Outdated
baseURL: process.env.AI_BASEURL | ||
}); | ||
|
||
const MAXLEN = Number(process.env.MAX_LEN) || 3000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MAXLEN
constant should be imported from the config.ts
file instead of being redefined here. The config.ts
already defines max_chunk_len
for the same purpose with the same default value (3000).
Having configuration values defined in multiple places leads to maintenance issues and potential inconsistencies if one location is updated but not the other.
Suggestion: Replace with import { max_chunk_len } from "../config";
and use that value directly.
i18n/controllers/loggers.ts
Outdated
console.log( | ||
`Summary log saved to logs/translation-summary-${timestamp}.log` | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log path is constructed differently in the code versus the message displayed to users:
// Actual path construction:
const logPath = path.join(
logDir,
`${translationSummaryPrefix}-${timestamp}.log`
);
// But message shows hardcoded format:
console.log(
`Summary log saved to logs/translation-summary-${timestamp}.log`
);
i18n/controllers/parsers.ts
Outdated
import { Readable } from "stream"; | ||
import { ignoredTags, max_chunk_len } from "../config"; | ||
import fs, { PathLike } from "fs"; | ||
import { FileLike } from "openai/uploads.mjs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import import { FileLike } from "openai/uploads.mjs";
appears to be unused in this file. There are no references to the FileLike
type throughout the code.
i18n/controllers/parsers.ts
Outdated
import fs, { PathLike } from "fs"; | ||
import { FileLike } from "openai/uploads.mjs"; | ||
|
||
const MAXLEN = max_chunk_len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code creates an unnecessary alias for max_chunk_len
. Additionally, throughout the code there are instances of Number(MAXLEN)
which is redundant since max_chunk_len
should already be a Number type in the config file.
i18n/initializers/initialize.ts
Outdated
import fs from "fs"; | ||
import OpenAI from "openai"; | ||
import path, { dirname } from "path"; | ||
import Stream from "stream"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Stream
module is imported but never used.
translated section2 cn xmls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated
- name: Clone translated_xmls | ||
run: | | ||
git clone -b translated_xmls https://github.com/source-academy/sicp.git translated_xmls | ||
mv translated_xmls/* xml/ | ||
rm -r translated_xmls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we cloning from a specific branch instead of merging it to the default branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what these new workflows are for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workflow is for triggering translations for all those English XMLs that are changed. The workflow is not set to be triggered on pushes yet because the AI translation is not very stable yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the translate-everything workflow is for translating all English XMLs
echo API_KEY=${{ secrets.OPENAI_KEY }} >> .env | ||
# echo API_KEY=${{ secrets.OPENAI_KEY2 }} >> .env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they are no longer used, the secret should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the start of the project, me and Yi Hao were both given an API key, so there are two secrets. Now one is used as a backup.
echo API_KEY=${{ secrets.OPENAI_KEY }} >> .env | ||
# echo API_KEY=${{ secrets.OPENAI_KEY2 }} >> .env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
All the AI translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add gitattributes file to the translated_xmls branch to mark they are generated by AI? |
How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch? |
They are the same. The translated_xmls branch is separately created because in the workflow for translation (translate-changed, translate-everything) an action called deploy which pushes files to a branch is used to push AI generated translations to GitHub, so the branch is created for this. |
I see, noted, and what is the rationale of running it in a workflow, as opposed to running it locally? |
sicp.sourceacademy.org is a static site, so the translated XMLs need to be deployed too, and thus we are storing them on GitHub. This is not really about the translation workflows. The translator can be run locally and then the generated content can be manually pushed to the translated_xmls branch. |
run yarn trans to start translation